Abstract. Multiplayer games with selfish agents naturally occur in the design
of distributed and embedded systems. As the goals of selfish agents are usually
neither equivalent nor antagonistic to each other, such games are non zero-sum
games. We study such games and show that a large class of them, including
games where the individual objectives are mean- or discounted-payoff, or
quantitative reachability, not only have a solution, but a simple solution.
We establish the existence of Nash equilibria that are composed
of k memoryless strategies for each agent in a setting with k agents, one main
and k − 1 minor strategies. The main strategy describes what happens when all
agents comply, whereas the minor strategies ensure that all other agents
immediately start to co-operate against the agent who first deviates from the plan. This
simplicity is important, as rational agents are an idealisation. Realistically, agents
have to decide on their moves with very limited resources, and complicated
strategies that require exponential (or even non-elementary) implementations cannot
be used in practice. The existence of simple strategies that we prove in
this paper therefore holds a promise of implementability.



1 Introduction 

The construction of correct and efficient computer systems (both hard- and software) 
is recognised to be an extremely difficult task. Formal methods have been exploited 
with some success in the design and verification of such systems. Mathematical logic, 
automata theory [17], and model-checking [12] have contributed much to the success
of formal methods in this field. However, traditional approaches aim at systems with 
qualitative specifications like LTL, and rely on the fact that these specifications are 
either satisfied or violated by the system. 

Unfortunately, these techniques do not trivially extend to complex systems, such as 
embedded or distributed systems. A main reason for this is that such systems often con- 
sist of multiple independent components with individual objectives. These components 
can be viewed as selfish agents that may cooperate and compete at the same time. It is 
difficult to model the interplay between these components with traditional finite state 
machines, as they cannot reflect the intricate quantitative valuation of an agent on how 
well he has met his goal. In particular, it is not realistic to assume that these components
are always cooperating to satisfy a common goal, as it is, e.g., assumed in works that
distinguish between an environment and a system. We argue that it is more realistic to 
assume that all components act like selfish agents that try to achieve their own objec- 
tives and are either unconcerned about the effect this has on the other components or 
consider this effect to be secondary. It is indeed a recent trend to enhance the system 
models used in the classical approach of verification by quantitative cost and gain functions,
and to exploit the well-established game-theoretic framework [21, 22] for their
formal analysis. 

The first steps towards the extension of computational models with concepts from 
classical game theory were taken by advancing from boolean to general two-player 
zero-sum games played on graphs [15]. Like their qualitative counterparts, those games
are adequate to model controller-environment interaction problems [24, 25]. As usual
in control theory, one can distinguish between moves of a control player, who plays 
actions to control a system to meet a control objective, and an antagonistic environment 
player. In the classical setting, the control player has a qualitative objective — he might, 
for example, try to enforce a temporal specification — whereas the environment tries 
to prevent this. In the extension to quantitative games, the controller instead tries to 
maximise its gain, while the environment tries to minimise it. This extension lifts the 
controller synthesis problem from a constructive extension of a decision problem to a 
classical optimisation problem. 

However, this extension has not lifted the restriction to purely antagonistic
interactions between a controller and a hostile environment. In order to study more complex
systems with more than two components, and with objectives that are not necessarily
antagonistic, we resort to multiplayer non zero-sum games. In this context, Nash
equilibria [21] take the place that winning and optimal strategies take in qualitative
and quantitative two-player zero-sum games, respectively. Surprisingly,
qualitative objectives have so far prevailed in the study of Nash equilibria for distributed
systems. However, we argue that Nash equilibria for selfish agents with quantitative
objectives (such as reaching a set of target states quickly or with a minimal
consumption of energy) are natural objects of study that ought to be considered alongside
(or instead of) traditional qualitative objectives.

Consequently, we study Nash equilibria for multiplayer non zero-sum games played 
on graphs with quantitative objectives. 

Our contribution. In this paper, we study turn-based multiplayer non zero-sum 
games played on finite graphs with quantitative objectives, expressed through a cost 
function for each player (cost games). Each cost function assigns, for every play of the
game, a value that represents the cost that is incurred for a player by this play. Cost
functions allow one to express classical quantitative objectives such as quantitative reachability
(i.e., the player aims at reaching a subset of states as soon as possible), or mean-payoff 
objectives. In this framework, all players are supposed to be rational: they want to min- 
imise their own cost or, equivalently, maximise their own gain. This invites the use of 
Nash equilibria as the adequate concept for cost games. 

Our results are twofold. Firstly, we prove the existence of Nash equilibria for a large 
class of cost games that includes quantitative reachability and mean-payoff objectives. 
Secondly, we study the complexity of these Nash equilibria in terms of the memory
needed in the strategies of the individual players in these Nash equilibria. More
precisely, we ensure existence of Nash equilibria whose strategies only require a number
of memory states that is linear in the size of the game for a wide class of cost games,
including games with quantitative reachability and mean-payoff objectives.

The general philosophy of our work is as follows: we try to derive existence of
Nash equilibria in multiplayer non zero-sum quantitative games (and characterization
of their complexity) through determinacy results (and characterization of the optimal
strategies) of several well-chosen two-player quantitative games derived from the
multiplayer game. These ideas were already successfully exploited in the qualitative
framework [16], and in the case of limit-average objectives [26].

Related work. Several recent papers have considered two-player zero-sum games
played on finite graphs with regular objectives enriched by some quantitative aspects.
Let us mention some of them: games with finitary objectives [10], mean-payoff parity
games [11], games with prioritised requirements [1], request-response games where the
waiting times between the requests and the responses are minimized [18, 28], games
whose winning conditions are expressed via quantitative languages [2], and recently,
cost-parity and cost-Streett games [13].

Other work concerns qualitative non zero-sum games. In [16], general criteria
ensuring existence of Nash equilibria and subgame perfect equilibria (resp. secure equilibria)
are provided for multiplayer (resp. 2-player) games, as well as complexity results. The
complexity of Nash equilibria in multiplayer concurrent games with Büchi objectives
has been discussed in [5]. [4] studies the existence of Nash equilibria for timed games
with qualitative reachability objectives.

Finally, there is a series of recent results on the combination of non zero-sum
aspects with quantitative objectives. In [3], the authors study games played on graphs
with terminal vertices where quantitative payoffs are assigned to the players. In [19],
the authors provide an algorithm to decide the existence of Nash equilibria for
concurrent priced games with quantitative reachability objectives. In [23], the authors prove
existence of a Nash equilibrium in Muller games on finite graphs where players have a
preference ordering on the sets of the Muller table. Let us also notice that the existence
of a Nash equilibrium in cost games with quantitative reachability objectives we study
in this paper has already been established in [7]. The new proofs we provide are simpler
and significantly improve the complexity of the strategies constructed from exponential
to linear in the size of the game.

Organization of the paper. In Section 2, we present the model of multiplayer cost
games and define the problems we study. The main results are given in Section 3.
Finally, in Section 4, we apply our general result on particular cost games with classical
objectives. Omitted proofs and additional materials can be found in the Appendix.

2 General Background 

In this section, we define our model of multiplayer cost game, recall the concept of 
Nash equilibrium and state the problems we study. 

Definition 1. A multiplayer cost game is a tuple
$\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi})$ where

• $\Pi$ is a finite set of players,

• $G = (V, E)$ is a finite directed graph with vertices $V$ and edges $E \subseteq V \times V$,

• $(V_i)_{i \in \Pi}$ is a partition of $V$ such that $V_i$ is the set of vertices controlled by player $i$,
and

• $\mathrm{Cost}_i : \mathrm{Plays} \to \mathbb{R} \cup \{+\infty, -\infty\}$ is the cost function of player $i$, where $\mathrm{Plays}$ is the
set of plays in $\mathcal{G}$, i.e. the set of infinite paths through $G$. For every play $\rho \in \mathrm{Plays}$,
the value $\mathrm{Cost}_i(\rho)$ represents the amount that player $i$ loses for this play.

Cost games are multiplayer turn-based quantitative non zero-sum games. We assume 
that the players are rational: they play in a way to minimise their own cost. 

Note that minimising cost or maximising gain are essentially equivalent, as maximising
the gain for player $i$ can be modelled by using $\mathrm{Cost}_i$ to be minus this gain
and then minimising the cost. This is particularly important in cases where two players 
have antagonistic goals, as it is the case in all two-player zero-sum games. To cover 
these cases without changing the setting, we sometimes refer to maximisation in order 
to preserve the connection to such games in the literature. 

For the sake of simplicity, we assume that each vertex has at least one outgoing
edge. Moreover, it is sometimes convenient to specify an initial vertex $v_0 \in V$ of the
game. We then call the pair $(\mathcal{G}, v_0)$ an initialised multiplayer cost game. This game is
played as follows. First, a token is placed on the initial vertex $v_0$. Whenever a token is
on a vertex $v \in V_i$ controlled by player $i$, player $i$ chooses one of the outgoing edges
$(v, v') \in E$ and moves the token along this edge to $v'$. This way, the players together
determine an infinite path through the graph $G$, which we call a play. Recall that
$\mathrm{Plays}$ is the set of all plays in $\mathcal{G}$.

A history $h$ of $\mathcal{G}$ is a finite path through the graph $G$. We denote by $\mathrm{Hist}$ the set of
histories of a game, and by $\epsilon$ the empty history. In the sequel, we write $h = h_0 \ldots h_k$,
where $h_0, \ldots, h_k \in V$ ($k \in \mathbb{N}$), for a history $h$, and similarly, $\rho = \rho_0 \rho_1 \ldots$, where
$\rho_0, \rho_1, \ldots \in V$, for a play $\rho$. A prefix of length $n + 1$ (for some $n \in \mathbb{N}$) of a play
$\rho = \rho_0 \rho_1 \ldots$ is the finite history $\rho_0 \ldots \rho_n$. We denote this history by $\rho[0, n]$.

Given a history $h = h_0 \ldots h_k$ and a vertex $v$ such that $(h_k, v) \in E$, we denote by $hv$
the history $h_0 \ldots h_k v$. Moreover, given a history $h = h_0 \ldots h_k$ and a play $\rho = \rho_0 \rho_1 \ldots$
such that $(h_k, \rho_0) \in E$, we denote by $h\rho$ the play $h_0 \ldots h_k \rho_0 \rho_1 \ldots$.

The function $\mathrm{Last}$ (resp. $\mathrm{First}$) returns, for a given history $h = h_0 \ldots h_k$, the last
vertex $h_k$ (resp. the first vertex $h_0$) of $h$. The function $\mathrm{First}$ naturally extends to plays.

A strategy of player $i$ in $\mathcal{G}$ is a function $\sigma : \mathrm{Hist} \to V$ assigning to each history
$h \in \mathrm{Hist}$ that ends in a vertex $\mathrm{Last}(h) \in V_i$ controlled by player $i$, a successor $v = \sigma(h)$
of $\mathrm{Last}(h)$. That is, $(\mathrm{Last}(h), \sigma(h)) \in E$. We say that a play $\rho = \rho_0 \rho_1 \ldots$ of $\mathcal{G}$ is
consistent with a strategy $\sigma$ of player $i$ if $\rho_{k+1} = \sigma(\rho_0 \ldots \rho_k)$ for all $k \in \mathbb{N}$ such that
$\rho_k \in V_i$. A strategy profile of $\mathcal{G}$ is a tuple $(\sigma_i)_{i \in \Pi}$ of strategies, where $\sigma_i$ refers to a
strategy for player $i$. Given an initial vertex $v$, a strategy profile determines the unique
play of $(\mathcal{G}, v)$ that is consistent with all strategies $\sigma_i$. This play is called the outcome
of $(\sigma_i)_{i \in \Pi}$ and denoted by $\langle (\sigma_i)_{i \in \Pi} \rangle_v$. We say that a player deviates from a strategy
(resp. from a play) if he does not carefully follow this strategy (resp. this play).
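For positional strategies, the outcome defined above can be computed by following the chosen edges until a vertex repeats, which yields an ultimately periodic play. The following is a minimal sketch only: the graph is the one used later in Example 3 (its edge set is read off the price function given there), and the names `EDGES` and `outcome`, as well as the encoding of a whole profile as a single dictionary, are assumptions of this sketch rather than notation from the paper.

```python
from typing import Dict, List, Tuple

# Edges of the Example 3 graph (assumed encoding: vertex -> list of successors).
EDGES: Dict[str, List[str]] = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": ["B"],
}


def outcome(profile: Dict[str, str], start: str) -> Tuple[List[str], List[str]]:
    """Return the outcome of a positional profile from `start` as (prefix, cycle).

    Since the sets V_i partition V, a positional strategy profile can be stored
    as one map from each vertex to the successor chosen by its owner.
    The play is the prefix followed by the cycle repeated forever.
    """
    first_visit: Dict[str, int] = {}
    path: List[str] = []
    v = start
    while v not in first_visit:
        first_visit[v] = len(path)
        path.append(v)
        successor = profile[v]
        if successor not in EDGES[v]:
            raise ValueError(f"({v}, {successor}) is not an edge")
        v = successor
    k = first_visit[v]
    return path[:k], path[k:]


# sigma_1(A) = B and sigma_2(B) = A: the play (AB)^omega.
print(outcome({"A": "B", "B": "A", "C": "B", "D": "B"}, "A"))
# Changing the choice at B to C: the play A(BC)^omega.
print(outcome({"A": "B", "B": "C", "C": "B", "D": "B"}, "A"))
```

The (prefix, cycle) representation is enough for positional profiles because the next vertex depends only on the current one, so the play must loop as soon as a vertex is revisited.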



(Footnote on the equivalence of minimising cost and maximising gain: sometimes the translation
implies minor follow-up changes, e.g., the replacement of lim inf by lim sup and vice versa.)



A finite strategy automaton for player $i \in \Pi$ over a game
$\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi})$ is a Mealy automaton $\mathcal{A}_i = (M, m_0, V, \delta, \nu)$ where:

- $M$ is a non-empty, finite set of memory states,

- $m_0 \in M$ is the initial memory state,

- $\delta : M \times V \to M$ is the memory update function,

- $\nu : M \times V_i \to V$ is the transition choice function, such that $(v, \nu(m, v)) \in E$ for
all $m \in M$ and $v \in V_i$.

We can extend the memory update function $\delta$ to a function $\delta^* : M \times \mathrm{Hist} \to M$ defined
by $\delta^*(m, \epsilon) = m$ and $\delta^*(m, hv) = \delta(\delta^*(m, h), v)$ for all $m \in M$ and $hv \in \mathrm{Hist}$.
The strategy $\sigma_{\mathcal{A}_i}$ computed by a finite strategy automaton $\mathcal{A}_i$ is defined by
$\sigma_{\mathcal{A}_i}(hv) = \nu(\delta^*(m_0, h), v)$ for all $hv \in \mathrm{Hist}$ such that $v \in V_i$. We say that $\sigma$ is a finite-memory
strategy if there exists a finite strategy automaton $\mathcal{A}$ such that $\sigma = \sigma_{\mathcal{A}}$. Moreover, we
say that $\sigma = \sigma_{\mathcal{A}}$ has a memory of size at most $|M|$, where $|M|$ is the number of states
of $\mathcal{A}$. In particular, if $|M| = 1$, we say that $\sigma$ is a positional strategy (the current vertex
of the play determines the choice of the next vertex). We call $(\sigma_i)_{i \in \Pi}$ a strategy profile
with memory $m$ if for all $i \in \Pi$, the strategy $\sigma_i$ has a memory of size at most $m$. A
strategy profile $(\sigma_i)_{i \in \Pi}$ is called positional or finite-memory if each $\sigma_i$ is a positional
or a finite-memory strategy, respectively.
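The definition of a finite strategy automaton translates almost literally into code. The sketch below is illustrative only: the class `StrategyAutomaton`, its field names, and the two-state example strategy are hypothetical and not taken from the paper; the memory set $M$ is left implicit in the two callables.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence


@dataclass
class StrategyAutomaton:
    """Mealy automaton (M, m0, V, delta, nu); M is implicit in the callables."""

    initial: Hashable                            # m0
    update: Callable[[Hashable, str], Hashable]  # delta : M x V -> M
    choose: Callable[[Hashable, str], str]       # nu : M x V_i -> V

    def memory_after(self, history: Sequence[str]) -> Hashable:
        """delta*(m0, h): fold the memory update over the history h."""
        m = self.initial
        for v in history:
            m = self.update(m, v)
        return m

    def strategy(self, history: Sequence[str]) -> str:
        """sigma(hv) = nu(delta*(m0, h), v) for a non-empty history hv."""
        *h, v = history
        return self.choose(self.memory_after(h), v)


# Hypothetical two-state strategy for player 1 on the graph of Example 3:
# move A -> B until the play has visited C, and A -> D afterwards.
def update(m, v):
    return "seen_C" if v == "C" else m


def choose(m, v):
    if v == "A":
        return "B" if m == "fresh" else "D"
    return "B"  # C and D have B as their only successor


automaton = StrategyAutomaton(initial="fresh", update=update, choose=choose)
print(automaton.strategy(["A"]))                      # memory is still "fresh"
print(automaton.strategy(["A", "B", "C", "B", "A"]))  # memory is "seen_C"
```

A positional strategy corresponds to the special case where `update` always returns the same memory state, so that `choose` depends on the current vertex only.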

We now define the notion of Nash equilibria in this quantitative framework. 

Definition 2. Given an initialised multiplayer cost game $(\mathcal{G}, v_0)$, a strategy profile $(\sigma_i)_{i \in \Pi}$
is a Nash equilibrium in $(\mathcal{G}, v_0)$ if, for every player $j \in \Pi$ and for every strategy $\sigma'_j$ of
player $j$, we have:

$$\mathrm{Cost}_j(\rho) \le \mathrm{Cost}_j(\rho'),$$

where $\rho = \langle (\sigma_i)_{i \in \Pi} \rangle_{v_0}$ and $\rho' = \langle \sigma'_j, (\sigma_i)_{i \in \Pi \setminus \{j\}} \rangle_{v_0}$.

This definition means that, for all $j \in \Pi$, player $j$ has no incentive to deviate from
$\sigma_j$ since he cannot strictly decrease his cost when using $\sigma'_j$ instead of $\sigma_j$. Keeping
the notations of Definition 2 in mind, a strategy $\sigma'_j$ such that $\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho')$ is
called a profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$.

Example 3. Let $\mathcal{G} = (\Pi, V, V_1, V_2, E, \mathrm{Cost}_1, \mathrm{Cost}_2)$ be the two-player cost game whose
graph $G = (V, E)$ is depicted in Figure 1. The states of player 1 (resp. 2) are
represented by circles (resp. squares). Thus, according to Figure 1, $V_1 = \{A, C, D\}$ and
$V_2 = \{B\}$. In order to define the cost functions of both players, we consider a price
function $\pi : E \to \{1, 2, 3\}$, which assigns a price to each edge of the graph. The
price function $\pi$ is as follows (see the numbers in Figure 1): $\pi(A, B) = \pi(B, A) = \pi(B, C) = 1$,
$\pi(A, D) = 2$ and $\pi(C, B) = \pi(D, B) = 3$. The cost function $\mathrm{Cost}_1$ of
player 1 expresses a quantitative reachability objective: he wants to reach the vertex $C$
(shaded vertex) while minimising the sum of prices up to this vertex. That is, for every
play $\rho = \rho_0 \rho_1 \ldots$ of $\mathcal{G}$:

$$\mathrm{Cost}_1(\rho) = \begin{cases} \sum_{i=1}^{n} \pi(\rho_{i-1}, \rho_i) & \text{if } n \text{ is the least index s.t. } \rho_n = C, \\ +\infty & \text{otherwise.} \end{cases}$$

As for the cost function $\mathrm{Cost}_2$ of player 2, it expresses a mean-payoff objective: the cost
of a play is the long-run average of the prices that appear along this play. Formally, for
any play $\rho = \rho_0 \rho_1 \ldots$ of $\mathcal{G}$:

$$\mathrm{Cost}_2(\rho) = \limsup_{n \to +\infty} \frac{1}{n} \cdot \sum_{i=1}^{n} \pi(\rho_{i-1}, \rho_i).$$

Each player aims at minimising the cost incurred by the play. Let us insist on the fact
that the players of a cost game may have different kinds of cost functions (as in this
example).

* Note that there exist several finite strategy automata $\mathcal{A}$ such that $\sigma = \sigma_{\mathcal{A}}$.
* We will keep this convention (circles for player 1, squares for player 2) throughout the paper.
* Note that we could have defined a different price function for each player. In this case, the
edges of the graph would have been labelled by couples of numbers.




An example of a play in $\mathcal{G}$ can be given by $\rho = (AB)^\omega$, leading to the costs
$\mathrm{Cost}_1(\rho) = +\infty$ and $\mathrm{Cost}_2(\rho) = 1$. In the same way, the play $\rho' = A(BC)^\omega$ induces
the following costs: $\mathrm{Cost}_1(\rho') = 2$ and $\mathrm{Cost}_2(\rho') = 2$.

Let us fix the initial vertex at the vertex $A$. The play $\rho = (AB)^\omega$ is the outcome of
the positional strategy profile $(\sigma_1, \sigma_2)$ where $\sigma_1(A) = B$ and $\sigma_2(B) = A$. Moreover,
this strategy profile is in fact a Nash equilibrium: player 2 gets the least cost he can
expect in this game, and player 1 has no incentive to choose the edge $(A, D)$ (it does
not allow the play to pass through vertex $C$).

We now consider the positional strategy profile $(\sigma'_1, \sigma'_2)$ with $\sigma'_1(A) = B$ and
$\sigma'_2(B) = C$. Its outcome is the play $\rho' = A(BC)^\omega$. However, this strategy profile is
not a Nash equilibrium, because player 2 can strictly lower his cost by always choosing
the edge $(B, A)$ instead of $(B, C)$, thus lowering his cost from 2 to 1. In other words,
the strategy $\sigma_2$ (defined before) is a profitable deviation for player 2 w.r.t. $(\sigma'_1, \sigma'_2)$.
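The costs quoted in this example can be recomputed mechanically for ultimately periodic plays. The sketch below is an illustration under stated assumptions: a play is encoded as a pair (prefix, cycle) of vertex lists, and the function names `reach_cost` and `mean_payoff` are introduced here only for the example. It uses the fact that the long-run average of an ultimately periodic play equals the average price of its cycle.

```python
import math
from fractions import Fraction

# Price function pi of Example 3.
PRICE = {
    ("A", "B"): 1, ("B", "A"): 1, ("B", "C"): 1,
    ("A", "D"): 2, ("C", "B"): 3, ("D", "B"): 3,
}


def reach_cost(prefix, cycle, goal="C"):
    """Cost_1: sum of prices up to the first visit of `goal`, +inf if never visited."""
    walk = prefix + cycle  # every vertex of the play occurs in this finite walk
    total = 0
    for i, v in enumerate(walk):
        if i > 0:
            total += PRICE[(walk[i - 1], v)]
        if v == goal:
            return total
    return math.inf


def mean_payoff(prefix, cycle):
    """Cost_2: for an ultimately periodic play, the long-run average is the cycle average."""
    n = len(cycle)
    cycle_price = sum(PRICE[(cycle[i], cycle[(i + 1) % n])] for i in range(n))
    return Fraction(cycle_price, n)


rho = ([], ["A", "B"])           # (AB)^omega
rho_prime = (["A"], ["B", "C"])  # A(BC)^omega
print(reach_cost(*rho), mean_payoff(*rho))
print(reach_cost(*rho_prime), mean_payoff(*rho_prime))
```

Comparing `mean_payoff` on the two plays reproduces the profitable deviation of player 2 described above: his cost drops from 2 on $A(BC)^\omega$ to 1 on $(AB)^\omega$.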

The questions studied in this paper are the following ones: 

Problem 1. Given a multiplayer cost game $\mathcal{G}$, does there exist a Nash equilibrium in $\mathcal{G}$?

Problem 2. Given a multiplayer cost game $\mathcal{G}$, does there exist a finite-memory Nash
equilibrium in $\mathcal{G}$?

* Note that player 1 has no choice in vertices $C$ and $D$, that is, $\sigma_1(hv)$ is necessarily equal to $B$
for $v \in \{C, D\}$ and $h \in \mathrm{Hist}$.



Obviously enough, if we make no restrictions on our cost games, the answer to
Problem 1 (and thus to Problem 2) is negative (see Example 4). Our first goal in this
paper is to identify a large class of cost games for which the answer to Problem 1 is
positive. Then we also positively reply to Problem 2 for subclasses of the previously
identified class of cost games. Both results can be found in Section 3.

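Example 4. Let $(\mathcal{G}, A)$ be the initialised one-player cost game depicted below, whose
cost function $\mathrm{Cost}_1$ is defined by $\mathrm{Cost}_1(A^n B^\omega) = \frac{1}{n}$ for $n \in \mathbb{N}_0$ and $\mathrm{Cost}_1(A^\omega) = +\infty$.
One can be convinced that there is no Nash equilibrium in this initialised game.

[Figure: graph of the one-player game of Example 4.]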

In order to define our class of cost games, we need the notions of Min-Max cost games,
determinacy and optimal strategies. The following two definitions are inspired by [27].

Definition 5. A Min-Max cost game is a tuple
$\mathcal{G} = (V, V_{\mathrm{Min}}, V_{\mathrm{Max}}, E, \mathrm{Cost}_{\mathrm{Min}}, \mathrm{Gain}_{\mathrm{Max}})$, where

• $G = (V, E)$ is a finite directed graph with vertices $V$ and edges $E \subseteq V \times V$,

• $(V_{\mathrm{Min}}, V_{\mathrm{Max}})$ is a partition of $V$ such that $V_{\mathrm{Min}}$ (resp. $V_{\mathrm{Max}}$) is the set of vertices
controlled by player Min (resp. Max), and

• $\mathrm{Cost}_{\mathrm{Min}} : \mathrm{Plays} \to \mathbb{R} \cup \{+\infty, -\infty\}$ is the cost function of player Min, that
represents the amount that he loses for a play, and $\mathrm{Gain}_{\mathrm{Max}} : \mathrm{Plays} \to \mathbb{R} \cup \{+\infty, -\infty\}$
is the gain function of player Max, that represents the amount that he wins for a
play.

In such a game, player Min wants to minimise his cost, while player Max wants to
maximise his gain. So, a Min-Max cost game is a particular case of a two-player cost
game. Let us stress that, according to this definition, a Min-Max cost game is zero-sum
if $\mathrm{Cost}_{\mathrm{Min}} = \mathrm{Gain}_{\mathrm{Max}}$, but this might not always be the case. We also point out that
Definition 5 allows one to take completely unrelated functions $\mathrm{Cost}_{\mathrm{Min}}$ and $\mathrm{Gain}_{\mathrm{Max}}$, but
usually they are similar (see Definition 15). In the sequel, we denote by $\Sigma_{\mathrm{Min}}$ (resp.
$\Sigma_{\mathrm{Max}}$) the set of strategies of player Min (resp. Max) in a Min-Max cost game.

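Definition 6. Given a Min-Max cost game $\mathcal{G}$, we define for every vertex $v \in V$ the
upper value $\mathrm{Val}^*(v)$ as:

$$\mathrm{Val}^*(v) = \inf_{\sigma_1 \in \Sigma_{\mathrm{Min}}} \sup_{\sigma_2 \in \Sigma_{\mathrm{Max}}} \mathrm{Cost}_{\mathrm{Min}}(\langle \sigma_1, \sigma_2 \rangle_v),$$

and the lower value $\mathrm{Val}_*(v)$ as:

$$\mathrm{Val}_*(v) = \sup_{\sigma_2 \in \Sigma_{\mathrm{Max}}} \inf_{\sigma_1 \in \Sigma_{\mathrm{Min}}} \mathrm{Gain}_{\mathrm{Max}}(\langle \sigma_1, \sigma_2 \rangle_v).$$

The game $\mathcal{G}$ is determined if, for every $v \in V$, we have $\mathrm{Val}^*(v) = \mathrm{Val}_*(v)$. In this
case, we say that the game $\mathcal{G}$ has a value, and for every $v \in V$, $\mathrm{Val}(v) = \mathrm{Val}^*(v) = \mathrm{Val}_*(v)$.
We also say that the strategies $\sigma_1^* \in \Sigma_{\mathrm{Min}}$ and $\sigma_2^* \in \Sigma_{\mathrm{Max}}$ are optimal strategies
for the respective players if, for every $v \in V$, we have that

$$\inf_{\sigma_1 \in \Sigma_{\mathrm{Min}}} \mathrm{Gain}_{\mathrm{Max}}(\langle \sigma_1, \sigma_2^* \rangle_v) = \mathrm{Val}(v) = \sup_{\sigma_2 \in \Sigma_{\mathrm{Max}}} \mathrm{Cost}_{\mathrm{Min}}(\langle \sigma_1^*, \sigma_2 \rangle_v).$$

If $\sigma_1^*$ is an optimal strategy for player Min, then he loses at most $\mathrm{Val}(v)$ when playing
according to it. On the other hand, player Max wins at least $\mathrm{Val}(v)$ if he plays according
to an optimal strategy $\sigma_2^*$ for him.

* For an example of a Min-Max cost game that is not zero-sum, see the average-price game in Definition 15.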

Examples of classical determined Min-Max cost games can be found in Section 4.

3 Results 

In this section, we first define a large class of cost games for which Problem 1 can be
answered positively (Theorem 10). Then, we study existence of simple Nash equilibria
(Theorems 13 and 14). To define this interesting class of cost games, we need the
concepts of cost-prefix-linear and coalition-determined cost games.

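Definition 7. A multiplayer cost game $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi})$ is
cost-prefix-linear if for every player $i \in \Pi$, every vertex $v \in V$ and history $hv \in \mathrm{Hist}$, there
exist $a \in \mathbb{R}$ and $b \in \mathbb{R}^+$ such that, for every play $\rho \in \mathrm{Plays}$ with $\mathrm{First}(\rho) = v$, we
have:

$$\mathrm{Cost}_i(h\rho) = a + b \cdot \mathrm{Cost}_i(\rho).$$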
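As an illustration (a sketch worked out from the definitions of Example 3, not taken from the paper), the two cost functions of Example 3 fit this definition as follows. Here $hv$ is a history, $\rho$ is a play with $\mathrm{First}(\rho) = v$, and $\pi(hv)$ denotes the sum of the prices along $hv$. The last line relies on the convention $0 \cdot (+\infty) = 0$, which is an assumption of this sketch.

```latex
\begin{align*}
\mathrm{Cost}_2(h\rho) &= \mathrm{Cost}_2(\rho)
  && (a = 0,\ b = 1),\\
\mathrm{Cost}_1(h\rho) &= \pi(hv) + \mathrm{Cost}_1(\rho)
  && (a = \pi(hv),\ b = 1) \quad \text{if } C \text{ does not occur in } h,\\
\mathrm{Cost}_1(h\rho) &= \pi(h[0,n])
  && (a = \pi(h[0,n]),\ b = 0) \quad \text{if } n \text{ is the least index with } h_n = C.
\end{align*}
```

For instance, with $h = A$, $v = B$ and $\rho = (BC)^\omega$, we get $\mathrm{Cost}_1(\rho) = 1$ and $\mathrm{Cost}_1(h\rho) = \pi(A, B) + 1 = 2$, while the mean-payoff cost is 2 for both $\rho$ and $h\rho$.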

Let us now define the concept of coalition-determined cost games. 

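Definition 8. A multiplayer cost game $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi})$ is
(positionally/finite-memory) coalition-determined if, for every player $i \in \Pi$, there exists a gain
function $\mathrm{Gain}^i_{\mathrm{Max}} : \mathrm{Plays} \to \mathbb{R} \cup \{+\infty, -\infty\}$ such that

- $\mathrm{Cost}_i \ge \mathrm{Gain}^i_{\mathrm{Max}}$, and

- the Min-Max cost game $\mathcal{G}^i = (V, V_i, V \setminus V_i, E, \mathrm{Cost}_i, \mathrm{Gain}^i_{\mathrm{Max}})$, where player $i$
(player Min) plays against the coalition $\Pi \setminus \{i\}$ (player Max), is determined and
has (positional/finite-memory) optimal strategies for both players. That is:
$\exists\, \sigma_i^* \in \Sigma_{\mathrm{Min}}$, $\exists\, \sigma_{-i}^* \in \Sigma_{\mathrm{Max}}$ (both positional/finite-memory) such that $\forall v \in V$

$$\inf_{\sigma_i \in \Sigma_{\mathrm{Min}}} \mathrm{Gain}^i_{\mathrm{Max}}(\langle \sigma_i, \sigma_{-i}^* \rangle_v) = \mathrm{Val}^i(v) = \sup_{\sigma_{-i} \in \Sigma_{\mathrm{Max}}} \mathrm{Cost}_i(\langle \sigma_i^*, \sigma_{-i} \rangle_v).$$

Given $i \in \Pi$, note that $\mathcal{G}^i$ does not depend on the cost functions $\mathrm{Cost}_j$, with $j \ne i$.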

Example 9. Let us consider the two-player cost game $\mathcal{G}$ of Example 3, where player 1
has a quantitative reachability objective ($\mathrm{Cost}_1$) and player 2 has a mean-payoff
objective ($\mathrm{Cost}_2$). We show that $\mathcal{G}$ is positionally coalition-determined.

Let us set $\mathrm{Gain}^1_{\mathrm{Max}} = \mathrm{Cost}_1$ and study the Min-Max cost game
$\mathcal{G}^1 = (V, V_1, V_2, E, \mathrm{Cost}_1, \mathrm{Gain}^1_{\mathrm{Max}})$, where player Min (resp. Max) is player 1 (resp. 2) and wants to
minimise $\mathrm{Cost}_1$ (resp. maximise $\mathrm{Gain}^1_{\mathrm{Max}}$). This game is positionally determined [27, 14].
We define positional strategies $\sigma_1^*$ and $\sigma_{-1}^*$ for player 1 and player 2, respectively,
in the following way: $\sigma_1^*(A) = B$ and $\sigma_{-1}^*(B) = A$. From $A$, their outcome is
$\langle \sigma_1^*, \sigma_{-1}^* \rangle_A = (AB)^\omega$, and $\mathrm{Cost}_1((AB)^\omega) = \mathrm{Gain}^1_{\mathrm{Max}}((AB)^\omega) = +\infty$. One can
check that the strategies $\sigma_1^*$ and $\sigma_{-1}^*$ are optimal in $\mathcal{G}^1$. Note that the positional
strategy $\tilde\sigma_1^*$ defined by $\tilde\sigma_1^*(A) = D$ is also optimal (for player 1) in $\mathcal{G}^1$. With this strategy, we
have that $\langle \tilde\sigma_1^*, \sigma_{-1}^* \rangle_A = (ADB)^\omega$, and $\mathrm{Cost}_1((ADB)^\omega) = \mathrm{Gain}^1_{\mathrm{Max}}((ADB)^\omega) = +\infty$.

We now examine the Min-Max cost game $\mathcal{G}^2 = (V, V_2, V_1, E, \mathrm{Cost}_2, \mathrm{Gain}^2_{\mathrm{Max}})$,
where $\mathrm{Gain}^2_{\mathrm{Max}}$ is defined as $\mathrm{Cost}_2$ but with lim inf instead of lim sup. In this game,
player Min (resp. Max) is player 2 (resp. 1) and wants to minimise $\mathrm{Cost}_2$ (resp.
maximise $\mathrm{Gain}^2_{\mathrm{Max}}$). This game is also positionally determined [27, 14]. Let $\sigma_2^*$ and $\sigma_{-2}^*$
be the positional strategies for player 2 and player 1, respectively, defined as follows:
$\sigma_2^*(B) = C$ and $\sigma_{-2}^*(A) = D$. From $A$, their outcome is $\langle \sigma_2^*, \sigma_{-2}^* \rangle_A = AD(BC)^\omega$,
and $\mathrm{Cost}_2(AD(BC)^\omega) = \mathrm{Gain}^2_{\mathrm{Max}}(AD(BC)^\omega) = 2$. We claim that $\sigma_2^*$ and $\sigma_{-2}^*$ are the
only positional optimal strategies in $\mathcal{G}^2$.
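The values of the two Min-Max games in this example can be cross-checked by brute force over positional strategies. The sketch below is an illustration only: restricting both players to positional strategies is an assumption that is justified here solely because the text states that both games are positionally determined, and all function names are hypothetical. On ultimately periodic plays the lim sup and lim inf averages coincide, so a single `mean_payoff` function serves as both cost and gain in the second game.

```python
import math
from fractions import Fraction
from itertools import product

EDGES = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["B"]}
PRICE = {("A", "B"): 1, ("B", "A"): 1, ("B", "C"): 1,
         ("A", "D"): 2, ("C", "B"): 3, ("D", "B"): 3}


def lasso(choice, start):
    """Play induced by positional choices, as (prefix, cycle)."""
    first_visit, path, v = {}, [], start
    while v not in first_visit:
        first_visit[v] = len(path)
        path.append(v)
        v = choice[v]
    return path[:first_visit[v]], path[first_visit[v]:]


def reach_cost(prefix, cycle, goal="C"):
    """Sum of prices up to the first visit of `goal`, +inf if never visited."""
    walk, total = prefix + cycle, 0
    for i, v in enumerate(walk):
        total += PRICE[(walk[i - 1], v)] if i > 0 else 0
        if v == goal:
            return total
    return math.inf


def mean_payoff(prefix, cycle):
    """Average price of the cycle (the long-run average of the play)."""
    n = len(cycle)
    return Fraction(sum(PRICE[(cycle[i], cycle[(i + 1) % n])] for i in range(n)), n)


def positional(vertices):
    """All positional strategies on `vertices`, each as a dict vertex -> successor."""
    vs = sorted(vertices)
    return [dict(zip(vs, succ)) for succ in product(*(EDGES[v] for v in vs))]


def positional_value(min_vertices, cost, start):
    """min over Min's positional strategies of max over Max's positional strategies."""
    max_vertices = set(EDGES) - set(min_vertices)
    return min(
        max(cost(*lasso({**s_min, **s_max}, start)) for s_max in positional(max_vertices))
        for s_min in positional(min_vertices)
    )


print(positional_value({"A", "C", "D"}, reach_cost, "A"))  # first game, player 1 is Min
print(positional_value({"B"}, mean_payoff, "A"))           # second game, player 2 is Min
```

The two printed values agree with the costs $+\infty$ and 2 reported for the outcomes from $A$ in the example.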

Theorem 10 positively answers Problem 1 for cost-prefix-linear, coalition-determined
cost games.

Theorem 10. In every initialised multiplayer cost game that is cost-prefix-linear and 
coalition-determined, there exists a Nash equilibrium. 

Proof. Let $(\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi}), v_0)$ be an initialised multiplayer cost
game that is cost-prefix-linear and coalition-determined. Thanks to the latter property,
we know that, for every $i \in \Pi$, there exists a gain function $\mathrm{Gain}^i_{\mathrm{Max}}$ such that the Min-Max
cost game $\mathcal{G}^i = (V, V_i, V \setminus V_i, E, \mathrm{Cost}_i, \mathrm{Gain}^i_{\mathrm{Max}})$ is determined and there exist
optimal strategies $\sigma_i^*$ and $\sigma_{-i}^*$ for player $i$ and the coalition $\Pi \setminus \{i\}$ respectively. In
particular, for $j \ne i$, we denote by $\sigma_{j,i}^*$ the strategy of player $j$ derived from the strategy
$\sigma_{-i}^*$ of the coalition $\Pi \setminus \{i\}$.

The idea is to define the required Nash equilibrium as follows: each player $i$ plays
according to his strategy $\sigma_i^*$ and punishes the first player $j \ne i$ who deviates from his
strategy $\sigma_j^*$, by playing according to $\sigma_{i,j}^*$ (the strategy of player $i$ derived from $\sigma_{-j}^*$ in
the game $\mathcal{G}^j$).

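Formally, we consider the outcome of the optimal strategies $(\sigma_i^*)_{i \in \Pi}$ from $v_0$, and
set $\rho := \langle (\sigma_i^*)_{i \in \Pi} \rangle_{v_0}$. We need to specify a punishment function $P : \mathrm{Hist} \to \Pi \cup \{\bot\}$
that detects who is the first player to deviate from the play $\rho$, i.e. who has to be punished.
For the initial vertex $v_0$, we define $P(v_0) = \bot$ (meaning that nobody has deviated from
$\rho$) and for every history $hv \in \mathrm{Hist}$, we let:

$$P(hv) := \begin{cases} \bot & \text{if } P(h) = \bot \text{ and } hv \text{ is a prefix of } \rho, \\ i & \text{if } P(h) = \bot,\ hv \text{ is not a prefix of } \rho, \text{ and } \mathrm{Last}(h) \in V_i, \\ P(h) & \text{otherwise } (P(h) \ne \bot). \end{cases}$$

Then the definition of the Nash equilibrium $(\tau_i)_{i \in \Pi}$ in $\mathcal{G}$ is as follows. For all $i \in \Pi$
and $h \in \mathrm{Hist}$ such that $\mathrm{Last}(h) \in V_i$,

$$\tau_i(h) := \begin{cases} \sigma_i^*(h) & \text{if } P(h) = \bot \text{ or } i, \\ \sigma_{i,P(h)}^*(h) & \text{otherwise.} \end{cases}$$

Clearly the outcome of $(\tau_i)_{i \in \Pi}$ is the play $\rho$ ($= \langle (\sigma_i^*)_{i \in \Pi} \rangle_{v_0}$).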
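The punishment function and the resulting strategies can be written down directly. The following sketch instantiates the construction on the game of Example 3 with the positional strategies of Example 9 ($\sigma_1^*(A) = B$, $\sigma_2^*(B) = C$, $\sigma_{-1}^*(B) = A$, $\sigma_{-2}^*(A) = D$), so that $\rho = A(BC)^\omega$. The table names `OWNER`, `SIGMA_STAR`, `PUNISH` and the helper functions are assumptions of this sketch, and histories are assumed to start in $v_0 = A$.

```python
BOTTOM = None  # plays the role of the symbol "bottom": nobody has deviated yet

OWNER = {"A": 1, "B": 2, "C": 1, "D": 1}               # V_1 = {A, C, D}, V_2 = {B}
SIGMA_STAR = {"A": "B", "B": "C", "C": "B", "D": "B"}  # sigma_1^* and sigma_2^* merged
# PUNISH[(i, j)] encodes sigma_{i,j}^*: how player i punishes player j.
PUNISH = {
    (1, 2): {"A": "D", "C": "B", "D": "B"},
    (2, 1): {"B": "A"},
}


def rho_at(k):
    """k-th vertex of the reference play rho = A(BC)^omega."""
    return "A" if k == 0 else "BC"[(k - 1) % 2]


def is_prefix_of_rho(history):
    return all(v == rho_at(k) for k, v in enumerate(history))


def punished(history):
    """P(h): the first player who left rho along h, or BOTTOM if nobody did."""
    p = BOTTOM
    for k in range(1, len(history)):
        if p is BOTTOM and not is_prefix_of_rho(history[: k + 1]):
            p = OWNER[history[k - 1]]  # the owner of Last(h) made the deviating move
    return p


def tau(history):
    """tau_i(h) for the player i who owns Last(h)."""
    last = history[-1]
    i, p = OWNER[last], punished(history)
    if p is BOTTOM or p == i:
        return SIGMA_STAR[last]
    return PUNISH[(i, p)][last]


print(tau(["A", "B", "C", "B"]))  # nobody deviated: follow sigma_2^*
print(tau(["A", "B", "A"]))       # player 2 left rho at B: player 1 punishes him
```

Because the strategies involved are positional, the only information that must be remembered beyond the current vertex is the position on $\rho$ and the identity of the punished player, which is the intuition behind the memory bound of the form $|V| + |\Pi|$ stated later.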



Now we show that the strategy profile $(\tau_i)_{i \in \Pi}$ is a Nash equilibrium in $\mathcal{G}$. As a
contradiction, let us assume that there exists a profitable deviation $\tau'_j$ for some player $j \in \Pi$.
We denote by $\rho' := \langle \tau'_j, (\tau_i)_{i \in \Pi \setminus \{j\}} \rangle_{v_0}$ the outcome where player $j$ plays according
to his profitable deviation $\tau'_j$ and the players of the coalition $\Pi \setminus \{j\}$ keep their
strategies $(\tau_i)_{i \in \Pi \setminus \{j\}}$. Since $\tau'_j$ is a profitable deviation for player $j$ w.r.t. $(\tau_i)_{i \in \Pi}$, we have
that:

$$\mathrm{Cost}_j(\rho') < \mathrm{Cost}_j(\rho). \tag{1}$$

As both plays $\rho$ and $\rho'$ start from vertex $v_0$, there exists a history $hv \in \mathrm{Hist}$ such
that $\rho = h \langle (\tau_i)_{i \in \Pi} \rangle_v$ and $\rho' = h \langle \tau'_j, (\tau_i)_{i \in \Pi \setminus \{j\}} \rangle_v$ (remark that $h$ could be empty).
Among the common prefixes of $\rho$ and $\rho'$, we choose the history $hv$ of maximal length.
By definition of the strategy profile $(\tau_i)_{i \in \Pi}$, we can write in the case of the outcome $\rho$
that $\rho = h \langle (\sigma_i^*)_{i \in \Pi} \rangle_v$. Whereas in the case of the outcome $\rho'$, player $j$ does not follow
his strategy $\sigma_j^*$ any more from vertex $v$, and so, the coalition $\Pi \setminus \{j\}$ punishes him by
playing according to the strategy $\sigma_{-j}^*$ after history $hv$, and so $\rho' = h \langle \tau'_j, \sigma_{-j}^* \rangle_v$ (see
Figure 2).

Fig. 2. Sketch of the tree representing the unravelling of the game $\mathcal{G}$ from $v_0$.

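Since $\sigma_{-j}^*$ is an optimal strategy for the coalition $\Pi \setminus \{j\}$ in the determined Min-Max
cost game $\mathcal{G}^j$, we have:

$$\begin{aligned} \mathrm{Val}^j(v) &= \inf_{\sigma_j \in \Sigma_{\mathrm{Min}}} \mathrm{Gain}^j_{\mathrm{Max}}(\langle \sigma_j, \sigma_{-j}^* \rangle_v) \\ &\le \mathrm{Gain}^j_{\mathrm{Max}}(\langle \tau'_j, \sigma_{-j}^* \rangle_v) \\ &\le \mathrm{Cost}_j(\langle \tau'_j, \sigma_{-j}^* \rangle_v). \end{aligned} \tag{2}$$

The last inequality comes from the hypothesis $\mathrm{Cost}_j \ge \mathrm{Gain}^j_{\mathrm{Max}}$ in the game $\mathcal{G}^j$.

Moreover, the game $\mathcal{G}$ is cost-prefix-linear, and then, when considering the history
$hv$, there exist $a \in \mathbb{R}$ and $b \in \mathbb{R}^+$ such that

$$\mathrm{Cost}_j(\rho') = \mathrm{Cost}_j(h \langle \tau'_j, \sigma_{-j}^* \rangle_v) = a + b \cdot \mathrm{Cost}_j(\langle \tau'_j, \sigma_{-j}^* \rangle_v). \tag{3}$$

As $b \ge 0$, Equations (2) and (3) imply:

$$\mathrm{Cost}_j(\rho') \ge a + b \cdot \mathrm{Val}^j(v). \tag{4}$$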



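Since $h$ is also a prefix of $\rho$, we have:

$$\mathrm{Cost}_j(\rho) = \mathrm{Cost}_j(h \langle (\sigma_i^*)_{i \in \Pi} \rangle_v) = a + b \cdot \mathrm{Cost}_j(\langle (\sigma_i^*)_{i \in \Pi} \rangle_v). \tag{5}$$

Furthermore, as $\sigma_j^*$ is an optimal strategy for player $j$ in the Min-Max cost game $\mathcal{G}^j$,
it follows that:

$$\mathrm{Val}^j(v) = \sup_{\sigma_{-j} \in \Sigma_{\mathrm{Max}}} \mathrm{Cost}_j(\langle \sigma_j^*, \sigma_{-j} \rangle_v) \ge \mathrm{Cost}_j(\langle (\sigma_i^*)_{i \in \Pi} \rangle_v). \tag{6}$$

Then, Equations (5) and (6) imply:

$$\mathrm{Cost}_j(\rho) \le a + b \cdot \mathrm{Val}^j(v). \tag{7}$$

Finally, Equations (4) and (7) lead to the following inequality:

$$\mathrm{Cost}_j(\rho) \le a + b \cdot \mathrm{Val}^j(v) \le \mathrm{Cost}_j(\rho'),$$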



which contradicts Equation (1). The strategy profile $(\tau_i)_{i \in \Pi}$ is then a Nash equilibrium
in the game $\mathcal{G}$. □

Remark 11. The proof of Theorem 10 remains valid for cost functions $\mathrm{Cost}_i : \mathrm{Plays} \to K$,
where $(K, +, \cdot, 0, 1, \le)$ is an ordered field. This allows, for instance, considering
non-standard real costs and using infinitesimals to model the costs of a player.

Example 12. Let us consider the initialised two-player cost game $(\mathcal{G}, A)$ of Example 3,
where player 1 has a quantitative reachability objective ($\mathrm{Cost}_1$) and player 2 has a
mean-payoff objective ($\mathrm{Cost}_2$). One can show that $\mathcal{G}$ is cost-prefix-linear. Since we saw in
Example 9 that this game is also positionally coalition-determined, we can apply the
construction in the proof of Theorem 10 to get a Nash equilibrium in $\mathcal{G}$. The construction
from this proof may result in two different Nash equilibria, depending on the selection
of the optimal strategies defined in Example 9.

The first Nash equilibrium $(\tau_1, \tau_2)$ with outcome $\rho = \langle \sigma_1^*, \sigma_2^* \rangle_A = A(BC)^\omega$ is
given, for any history $h$, by:

$$\tau_1(hA) := \begin{cases} B & \text{if } P(hA) = \bot \text{ or } 1, \\ D & \text{otherwise;} \end{cases} \qquad \tau_2(hB) := \begin{cases} C & \text{if } P(hB) = \bot \text{ or } 2, \\ A & \text{otherwise,} \end{cases}$$

where the punishment function $P$ is defined as in the proof of Theorem 10 and depends
on the play $\rho$. The cost for this finite-memory Nash equilibrium is $\mathrm{Cost}_1(\rho) = 2 = \mathrm{Cost}_2(\rho)$.

The strategy $\tilde\tau_1$ of the second Nash equilibrium $(\tilde\tau_1, \tau_2)$ with outcome
$\tilde\rho = \langle \tilde\sigma_1^*, \sigma_2^* \rangle_A = AD(BC)^\omega$ is given by $\tilde\tau_1(hA) := D$ for all histories $h$. The cost for this finite-memory
Nash equilibrium is $\mathrm{Cost}_1(\tilde\rho) = 6$ and $\mathrm{Cost}_2(\tilde\rho) = 2$, respectively.

Note that there is no positional Nash equilibrium with outcome $\rho$ (resp. $\tilde\rho$).

The two following theorems provide results about the complexity of the Nash
equilibrium defined in the latter proof. Applications of these theorems to specific classes of
cost games are provided in Section 4.


Theorem 13. In every initialised multiplayer cost game that is cost-prefix-linear and
positionally coalition-determined, there exists a Nash equilibrium with memory (at
most) $|V| + |\Pi|$.

Theorem 14. In every initialised multiplayer cost game that is cost-prefix-linear and 
finite-memory coalition-determined, there exists a Nash equilibrium with finite memory. 

The proofs of these two theorems rely on the construction of the Nash equilibrium
provided in the proof of Theorem 10.

4 Applications 

In this section, we exhibit several classes of classical objectives that can be encoded in 
our general setting. The list we propose is far from being exhaustive. 

4.1 Qualitative Objectives

Multiplayer games with qualitative (win/lose) objectives can naturally be encoded via
multiplayer cost games; for instance via cost functions $\mathrm{Cost}_i : \mathrm{Plays} \to \{1, +\infty\}$,
where 1 (resp. $+\infty$) means that the play is won (resp. lost) by player $i$. Let us now
consider the subclass of qualitative games with prefix-independent Borel objectives.
Given such a game $\mathcal{G}$, we have that $\mathcal{G}$ is coalition-determined, as a consequence of the
Borel determinacy theorem [20]. Moreover, the prefix-independence hypothesis
guarantees that $\mathcal{G}$ is also cost-prefix-linear (by taking $a = 0$ and $b = 1$). By
applying Theorem 10, we obtain the existence of a Nash equilibrium for qualitative
games with prefix-independent Borel objectives. Let us notice that this result is already
present in [16].

When considering more specific subclasses of qualitative games enjoying a
positional determinacy result, such as parity games [15], we can apply Theorem 13 and
ensure existence of a Nash equilibrium whose memory is (at most) linear.

4.2 Classical Quantitative Objectives 

We here give four well-known kinds of Min-Max cost games and see later that they are 
determined. For each sort of game, the cost and gain functions are defined from a price 
function (and a reward function in the last case), which labels the edges of the game 
graph with prices (and rewards). 

Definition 15 ([27]). Given a game graph $G = (V, V_{\mathrm{Min}}, V_{\mathrm{Max}}, E)$, a price function
$\pi : E \to \mathbb{R}$ that assigns a price to each edge, a diverging reward function $\vartheta : E \to \mathbb{R}$ that
assigns a reward to each edge, and a play $\rho = \rho_0\rho_1\ldots$ in $G$, we define the following
Min-Max cost games:

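Footnote: An objective $\Omega \subseteq V^\omega$ is prefix-independent if and only if, for every play
$\rho = \rho_0\rho_1\ldots \in V^\omega$, we have that $\rho \in \Omega$ iff for every $n \in \mathbb{N}$, $\rho_n\rho_{n+1}\ldots \in \Omega$.

Footnote: A reward function $\vartheta$ is diverging if, for all plays $\rho = \rho_0\rho_1\ldots$ in $G$, it holds that
$\lim_{n \to \infty} \left| \sum_{i=1}^{n} \vartheta(\rho_{i-1}, \rho_i) \right| = +\infty$. This is
equivalent to requiring that every cycle has a positive sum of rewards.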



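(i) a reachability-price game is a Min-Max cost game $\mathcal{G} = (G, \mathrm{RP}_{\mathrm{Min}}, \mathrm{RP}_{\mathrm{Max}})$ together
with a given goal set $\mathrm{Goal} \subseteq V$, where

$$\mathrm{RP}_{\mathrm{Min}}(\rho) = \mathrm{RP}_{\mathrm{Max}}(\rho) = \begin{cases} \pi(\rho[0,n]) & \text{if } n \text{ is the least index s.t. } \rho_n \in \mathrm{Goal}, \\ +\infty & \text{otherwise,} \end{cases}$$

with $\pi(\rho[0,n]) = \sum_{i=1}^{n} \pi(\rho_{i-1}, \rho_i)$;

(ii) a discounted-price game is a Min-Max cost game $\mathcal{G} = (G, \mathrm{DP}_{\mathrm{Min}}(\lambda), \mathrm{DP}_{\mathrm{Max}}(\lambda))$
together with a given discount factor $\lambda \in\, ]0,1[$, where

$$\mathrm{DP}_{\mathrm{Min}}(\lambda)(\rho) = \mathrm{DP}_{\mathrm{Max}}(\lambda)(\rho) = (1 - \lambda) \cdot \sum_{i=1}^{+\infty} \lambda^{i-1}\, \pi(\rho_{i-1}, \rho_i);$$

(iii) an average-price game is a Min-Max cost game $\mathcal{G} = (G, \mathrm{AP}_{\mathrm{Min}}, \mathrm{AP}_{\mathrm{Max}})$, where

$$\mathrm{AP}_{\mathrm{Min}}(\rho) = \limsup_{n \to +\infty} \frac{\pi(\rho[0,n])}{n} \quad \text{and} \quad \mathrm{AP}_{\mathrm{Max}}(\rho) = \liminf_{n \to +\infty} \frac{\pi(\rho[0,n])}{n};$$

(iv) a price-per-reward-average game is a Min-Max cost game $\mathcal{G} = (G, \mathrm{PRAvg}_{\mathrm{Min}}, \mathrm{PRAvg}_{\mathrm{Max}})$, where

$$\mathrm{PRAvg}_{\mathrm{Min}}(\rho) = \limsup_{n \to +\infty} \frac{\pi(\rho[0,n])}{\vartheta(\rho[0,n])} \quad \text{and} \quad \mathrm{PRAvg}_{\mathrm{Max}}(\rho) = \liminf_{n \to +\infty} \frac{\pi(\rho[0,n])}{\vartheta(\rho[0,n])},$$

with $\vartheta(\rho[0,n]) = \sum_{i=1}^{n} \vartheta(\rho_{i-1}, \rho_i)$.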

An average-price game is then a particular case of a price-per-reward-average game
(with reward 1 on every edge).
Let us remark that, in Example 3, the cost function $\mathrm{Cost}_1$ (resp. $\mathrm{Cost}_2$) corresponds
to $\mathrm{RP}_{\mathrm{Min}}$ with $\mathrm{Goal} = \{C\}$ (resp. $\mathrm{AP}_{\mathrm{Min}}$). The game $\mathcal{G}^1$ (resp. $\mathcal{G}^2$) of Example 9 is a
reachability-price (resp. average-price) game.
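To make the four cost functions of Definition 15 concrete, the following Python sketch evaluates them on an ultimately periodic play (a finite prefix followed by a cycle repeated forever), where each limit has a closed form. It is only an illustration: the encoding of a play as two lists of edges, the function names, and the sample values are assumptions and do not come from the paper. The price-per-reward computation assumes a diverging reward function, so that the cycle has a nonzero reward sum.

```python
from fractions import Fraction
from math import inf

# Assumed encoding: a play is `prefix` (finite) followed by `cycle` (repeated forever).
# Each edge is a triple (target_vertex, price, reward).

def reachability_price(start, prefix, cycle, goal):
    """Sum of prices up to the first visit of `goal`, or +inf if it is never visited."""
    if start in goal:
        return 0
    total = 0
    # One pass over the cycle is enough: later passes visit the same vertices.
    for target, price, _ in prefix + cycle:
        total += price
        if target in goal:
            return total
    return inf

def discounted_price(prefix, cycle, lam):
    """(1 - lam) * sum_{i>=1} lam^(i-1) * price_i, with a geometric series for the cycle."""
    head = sum(lam ** i * p for i, (_, p, _) in enumerate(prefix))
    turn = sum(lam ** j * p for j, (_, p, _) in enumerate(cycle))
    return (1 - lam) * (head + lam ** len(prefix) * turn / (1 - lam ** len(cycle)))

def average_price(cycle):
    """On such a play, lim sup and lim inf both equal the mean price of the cycle."""
    return Fraction(sum(p for _, p, _ in cycle), len(cycle))

def price_per_reward_average(cycle):
    """On such a play, the ratio converges to (cycle price) / (cycle reward)."""
    return Fraction(sum(p for _, p, _ in cycle), sum(r for _, _, r in cycle))

# Hypothetical play A B (C B)^omega with made-up prices and rewards.
prefix = [("B", 1, 1)]
cycle = [("C", 2, 1), ("B", 0, 3)]
print(reachability_price("A", prefix, cycle, {"C"}))
print(discounted_price(prefix, cycle, Fraction(1, 2)))
print(average_price(cycle), price_per_reward_average(cycle))
```

On plays of this shape the Min and Max variants of the average-price and price-per-reward-average functions coincide, because the underlying sequences converge.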

The following theorem is a well-known result about the particular cost games
described in Definition 15.

Theorem 16 ([27, 14]). Reachability-price games, discounted-price games, average-price
games, and price-per-reward-average games are determined and have positional optimal
strategies.

This result implies that a multiplayer cost game where each cost function is $\mathrm{RP}_{\mathrm{Min}}$,
$\mathrm{DP}_{\mathrm{Min}}$, $\mathrm{AP}_{\mathrm{Min}}$ or $\mathrm{PRAvg}_{\mathrm{Min}}$ is positionally coalition-determined. Moreover, one can
show that such a game is cost-prefix-linear. Theorem 17 then follows from Theorem 13.

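Theorem 17. In every initialised multiplayer cost game $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E,
(\mathrm{Cost}_i)_{i \in \Pi})$ where the cost function $\mathrm{Cost}_i$ belongs to $\{\mathrm{RP}_{\mathrm{Min}}, \mathrm{DP}_{\mathrm{Min}}, \mathrm{AP}_{\mathrm{Min}},
\mathrm{PRAvg}_{\mathrm{Min}}\}$ for every player $i \in \Pi$, there exists a Nash equilibrium with memory (at
most) $|V| + |\Pi|$.

Footnote: When the cost function of a player is $\mathrm{AP}_{\mathrm{Min}}$, we say that he has a mean-payoff objective.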



Note that the existence of finite-memory Nash equilibria in cost games with quantitative
reachability objectives has already been established in [7,8]. Even if not explicitly
stated in the previous papers, one can deduce from the proof of [8, Lemma 16] that
the provided Nash equilibrium has a memory (at least) exponential in the size of the 
cost game. Thus, Theorem 17 significantly improves the complexity of the strategies
constructed in the case of cost games with quantitative reachability objectives. 

4.3 Combining Qualitative and Quantitative Objectives 

Multiplayer cost games make it possible to encode games combining both qualitative and
quantitative objectives, such as mean-payoff parity games [11]. In our framework, where each
player aims at minimising his cost, the mean-payoff parity objective could be encoded
as follows: $\mathrm{Cost}_i(\rho) = \mathrm{AP}_{\mathrm{Min}}(\rho)$ if the parity condition is satisfied, $+\infty$ otherwise.
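As a small illustration of this encoding, the sketch below computes the cost of an ultimately periodic play from the prices and priorities of its cycle. The text does not fix a parity convention, so the code assumes that the parity condition holds when the least priority seen infinitely often is even; the function name and data layout are likewise hypothetical.

```python
from fractions import Fraction
from math import inf

def mean_payoff_parity_cost(cycle_prices, cycle_priorities):
    """Cost of a play ending in the given cycle: AP_Min if parity holds, +inf otherwise."""
    # Assumed convention: the least priority occurring infinitely often must be even.
    # On an ultimately periodic play, these are exactly the priorities on the cycle.
    if min(cycle_priorities) % 2 != 0:
        return inf
    # The average price of the play is the mean price of the cycle.
    return Fraction(sum(cycle_prices), len(cycle_prices))

print(mean_payoff_parity_cost([2, 0], [2, 3]))  # parity satisfied
print(mean_payoff_parity_cost([2, 0], [1, 2]))  # parity violated
```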

The determinacy of mean-payoff parity games, together with the existence of optimal
strategies (that could require infinite memory), has been proved in [11]. This result
implies that multiplayer cost games with mean-payoff parity objectives are
coalition-determined. Moreover, one can prove that such a game is also cost-prefix-linear (by
taking $a = 0$ and $b = 1$). By applying Theorem 10, we obtain the existence of a Nash
equilibrium for multiplayer cost games with mean-payoff parity objectives. As far as
we know, this is the first result about the existence of a Nash equilibrium in cost games
with mean-payoff parity objectives.

Remark 18. Let us emphasise that Theorem 10 applies to cost games where the players
have different kinds of cost functions (as in Example 3). In particular, one player could
have a qualitative Büchi objective, a second player a discounted-price objective, a third
player a mean-payoff parity objective, and so on.

Technical Appendix



A Example of a cost game which is not cost-prefix-linear 

Example 19. Multiplayer cost games make it possible to encode energy games. Let $\mathcal{G}$ be a cost
game defined by means of a price function $\pi : E \to \mathbb{R}$ that assigns a price to each
edge. In our framework, where each player aims at minimising his cost, an energy
objective [6] (with threshold $T \in \mathbb{R}$) could be encoded as follows:

$$\mathrm{Cost}_i(\rho) = \begin{cases} \sup_{n \ge 0} \pi(\rho[0,n]) & \text{if } \sup_{n \ge 0} \pi(\rho[0,n]) \le T, \\ +\infty & \text{otherwise,} \end{cases}$$

with $\pi(\rho[0,n]) = \sum_{i=1}^{n} \pi(\rho_{i-1}, \rho_i)$.




Fig. 3. A cost game which is not cost-prefix-linear 



Let us consider the one-player cost game with an energy objective (with threshold
$T = 2$) depicted in Figure 3. We show that this game is not cost-prefix-linear. For
this, we exhibit a history $hv \in \mathrm{Hist}$ such that for all $a, b \in \mathbb{R}$ there exists a play
$\rho \in \mathrm{Plays}$ with $\mathrm{First}(\rho) = v$, such that $\mathrm{Cost}_1(h\rho) \neq a + b \cdot \mathrm{Cost}_1(\rho)$. We in fact
give a play $\rho$ independent of $a$ and $b$. Let $hv$ be the history $AAABA$ and $\rho$ be the play
$(AB)^\omega$. We have that $\mathrm{Cost}_1(\rho) = 1$ and $\mathrm{Cost}_1(h\rho) = \mathrm{Cost}_1(AA(AB)^\omega) = +\infty$, since
$\sup_{n \ge 0} \pi((h\rho)[0,n]) = 3$, which is above the threshold $T = 2$. It is thus impossible to
find $a, b \in \mathbb{R}$ such that:

$$+\infty = \mathrm{Cost}_1(h\rho) = a + b \cdot \mathrm{Cost}_1(\rho) = a + b.$$
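The computation in Example 19 can be replayed with a short Python sketch. Figure 3 is not reproduced in this text, so the edge prices below are hypothetical: they are chosen only because they yield the values quoted in the example (supremum 1 along $(AB)^\omega$ and supremum 3 along $AA(AB)^\omega$). The function name and the encoding of plays as strings of vertex names are also assumptions.

```python
from math import inf

# Hypothetical prices, chosen to agree with the values stated in Example 19.
PRICE = {("A", "A"): 1, ("A", "B"): 1, ("B", "A"): -1}

def energy_cost(prefix, cycle, threshold):
    """Energy cost of the play prefix . cycle^omega (strings of vertex names)."""
    walk = prefix + cycle + cycle[0]   # prefix, one turn of the cycle, back to its start
    total = best = at_cycle_start = 0  # the empty sum (n = 0) contributes 0 to the sup
    for step, (u, v) in enumerate(zip(walk, walk[1:]), start=1):
        total += PRICE[(u, v)]
        best = max(best, total)
        if step == len(prefix):
            at_cycle_start = total     # accumulated price when the cycle is first entered
    if total - at_cycle_start > 0:     # every further turn adds a positive amount
        return inf
    # Otherwise later turns never exceed the supremum already reached.
    return best if best <= threshold else inf

cost_rho = energy_cost("", "AB", threshold=2)       # rho = (AB)^omega
cost_h_rho = energy_cost("AA", "AB", threshold=2)   # h rho = AA(AB)^omega
print(cost_rho, cost_h_rho)
```

Since the first cost is finite and the second is infinite, no real numbers $a$ and $b$ can satisfy $\mathrm{Cost}_1(h\rho) = a + b \cdot \mathrm{Cost}_1(\rho)$ for this play.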



B Remark about secure and subgame perfect equilibria 

Remark 20. It would be tempting to try to prove the existence of subgame perfect
equilibria or secure equilibria in multiplayer cost games with techniques similar to the
proof of Theorem 10. However, our definition of the Nash equilibrium in the proof of
Theorem 10 is (in general) neither a subgame perfect equilibrium, nor a secure
equilibrium. To see this, let us consider the following two cost games $\mathcal{G}$ and $\mathcal{H}$, whose graphs
are depicted in Figures 4 and 5 respectively. Both games are initialised in vertex $A$.

The game $\mathcal{G}$ is a two-player cost game where the vertices of player 1 (resp. 2) are
represented by circles (resp. squares), that is, $V_1 = \{B, C, D, E, F\}$ and $V_2 = \{A\}$.



Footnote: The definitions of subgame perfect and secure equilibria in this context can be found in [9].



Fig. 4. Game $\mathcal{G}$.

Fig. 5. Game $\mathcal{H}$.



The cost functions of both players are $\mathrm{RP}_{\mathrm{Min}}$, with $\mathrm{Goal}_1 = \mathrm{Goal}_2 = \{D, E\}$ and the
price function $\pi : E \to \mathbb{R}$ defined by $\pi(e) = 1$ for any edge $e \in E$ (same price function
for the two players). It means that both players have reachability objectives and want to
reach vertex $D$ or $E$ within the least number of edges.

Let us study the two Min-Max cost games $\mathcal{G}^1$ and $\mathcal{G}^2$. In the game $\mathcal{G}^1$, let $\sigma_1^*$ be
defined as $\sigma_1^*(C) = E$ and $\sigma_{-1}^*$ be defined as $\sigma_{-1}^*(A) = C$. Then, $\sigma_1^*$ and $\sigma_{-1}^*$ are
positional optimal strategies for player Min (player 1) and player Max (player 2)
respectively. In the game $\mathcal{G}^2$, we define $\sigma_2^*$ and $\sigma_{-2}^*$ as $\sigma_2^*(A) = B$ and $\sigma_{-2}^*(C) = F$.
These two strategies are positional optimal strategies for player Min (player 2)
and player Max (player 1) respectively.

If we define a Nash equilibrium $(\tau_1, \tau_2)$ in $\mathcal{G}$ exactly as in the proof of Theorem 10,
depending on these strategies $\sigma_1^*$, $\sigma_{-1}^*$, $\sigma_2^*$ and $\sigma_{-2}^*$, then $(\tau_1, \tau_2)$ is not a subgame
perfect equilibrium in $\mathcal{G}$. Indeed, $(\tau_1|_A, \tau_2|_A)$ is not a Nash equilibrium in the subgame
with history $AC$: player 1 punishes player 2 by choosing the edge $(C, F)$ (according to
$\sigma_{-2}^*$) whereas player 1 could pay a smaller cost by choosing the edge $(C, E)$.

Furthermore, this Nash equilibrium also gives a counter-example to subgame
perfection for other classical punishments (see [22]; e.g., punish the last player
who has deviated and only for a finite number of steps).

Let us now consider the two-player cost game $\mathcal{H}$ where $V_1 = \{A, B\}$ and $V_2 =
\{C, D\}$ (see Figure 5). The price function and the cost functions of the two players are
the same as in the game $\mathcal{G}$, except that here $\mathrm{Goal}_1 = \{A, C\}$ and $\mathrm{Goal}_2 = \{C\}$. Note
that player 2 does not really play in $\mathcal{H}$; only player 1 has a choice to make: he can
choose the edge $(B, C)$ or the edge $(B, D)$.

As before, we study the two Min-Max cost games $\mathcal{H}^1$ and $\mathcal{H}^2$. Let $\sigma_1^*$ be a positional
strategy of player 1 in $\mathcal{H}^1$ such that $\sigma_1^*(B) = C$, and $\sigma_{-2}^*$ be a positional strategy of
player 1 in $\mathcal{H}^2$ such that $\sigma_{-2}^*(B) = D$. These strategies are optimal in the two respective
games. Then, we define a Nash equilibrium in $\mathcal{H}$ in the same way as in the proof of
Theorem 10, depending on $\sigma_1^*$ and $\sigma_{-2}^*$. Actually, this is not a secure equilibrium in
$\mathcal{H}$ because player 1 can strictly increase player 2's cost while keeping his own cost,
by choosing the edge $(B, D)$ instead of following $\sigma_1^*$ ($\sigma_1^*$ suggests choosing the edge
$(B, C)$).



Footnote: In both figures, shaded (resp. doubly circled) vertices represent the goal set $\mathrm{Goal}_1$ (resp.
$\mathrm{Goal}_2$).



C Proof of Theorem 13



Theorem 13 states that in every initialised multiplayer cost game that is cost-prefix-linear
and positionally coalition-determined, there exists a Nash equilibrium with memory
(at most) $|V| + |\Pi|$.

Proof. Let $(\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi}), v_0)$ be an initialised multiplayer cost
game that is cost-prefix-linear and positionally coalition-determined. For this proof, we
keep the notations introduced in the proof of Theorem 10. In particular, we consider
the Nash equilibrium $(\tau_i)_{i \in \Pi}$ as defined in the latter proof, whose outcome is $\rho :=
\langle (\sigma_i^*)_{i \in \Pi} \rangle_{v_0}$. We recall that for all $i \in \Pi$, the strategy $\tau_i$ depends on the strategies $\sigma_i^*$
(optimal strategy in $\mathcal{G}^i$) and $\sigma_{i,j}^*$ (derived from the optimal strategy $\sigma_{-j}^*$ in $\mathcal{G}^j$) for $j \in
\Pi \setminus \{i\}$. As the game $\mathcal{G}$ is now positionally coalition-determined by hypothesis, these
strategies are assumed to be positional. This proof consists in showing that $(\tau_i)_{i \in \Pi}$ is a
strategy profile with memory (at most) $|V| + |\Pi|$.

For this purpose, we define a finite strategy automaton for each player that remembers
the play $\rho$ and who has to be punished. As the play $\rho$ is the outcome of the
positional strategy profile $(\sigma_i^*)_{i \in \Pi}$, we can write $\rho := v_0 \ldots v_{k-1}(v_k \ldots v_n)^\omega$ where
$0 \le k \le n \le |V|$, $v_l \in V$ for all $0 \le l \le n$ and these vertices are all different.
For any $i \in \Pi$, let $\mathcal{A}_i = (M, m_0, V, \delta, \nu)$ be the strategy automaton of player $i$, where:

- $M = \{v_0v_0, v_0v_1, \ldots, v_{n-1}v_n, v_nv_k\} \cup \Pi \setminus \{i\}$.

As we want to be sure that the play $\rho$ is followed by all players, we need to memorise
which movement (edge) has to be chosen at each step of $\rho$. This is the role
of $\{v_0v_0, v_0v_1, \ldots, v_{n-1}v_n, v_nv_k\}$. But in case a player deviates from $\rho$, we only
have to remember this player during the rest of the play (no matter if another player
later deviates from $\rho$). This is the role of $\Pi \setminus \{i\}$.

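- $m_0 = v_0v_0$ (this memory state means that the play has not begun yet).

- $\delta : M \times V \to M$ is defined in this way: given $m \in M$ and $v \in V$,

$$\delta(m, v) := \begin{cases} j & \text{if } m = j \in \Pi \text{ or } (m = u_1u_2, \text{ with } u_1, u_2 \in V,\ v \neq u_2 \text{ and } u_1 \in V_j), \\ v_lv_{l+1} & \text{if } m = uv_l \text{ for a certain } l \in \{0, \ldots, n-1\},\ u \in V, \text{ and } v = v_l, \\ v_nv_k & \text{otherwise } (m = uv_n \text{ and } v = v_n). \end{cases}$$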



Intuitively, m represents either a player to punish, or the edge that should, if fol- 
lowing p, have been chosen at the last step of the current stage of the play, and v is 
the real last vertex of the current stage of the play. 

Notice that in this definition of $\delta$, $j$ is different from $i$ because if player $i$ follows the
strategy computed by this strategy automaton, one can be convinced that he does
not deviate from the play $\rho$.

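- $\nu : M \times V_i \to V$ is defined in this way: given $m \in M$ and $v \in V_i$,

$$\nu(m, v) := \begin{cases} \sigma_i^*(v) & \text{if } m = u_1u_2 \text{ with } u_1, u_2 \in V \text{ and } v = u_2, \\ \sigma_{i,j}^*(v) & \text{if } m = j \in \Pi \text{ or } (m = u_1u_2, \text{ with } u_1, u_2 \in V,\ v \neq u_2 \text{ and } u_1 \in V_j). \end{cases}$$

The idea is to play according to $\sigma_i^*$ if everybody follows the play $\rho$, and switch to
$\sigma_{i,j}^*$ if player $j$ is the first player who has deviated from $\rho$.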

Obviously, the strategy $\sigma_{\mathcal{A}_i}$ computed by the strategy automaton $\mathcal{A}_i$ exactly
corresponds to the strategy $\tau_i$ of the Nash equilibrium. And so, we can conclude that each
strategy $\tau_i$ requires a memory of size at most $|M| \le |\Pi| + |V|$. □
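The construction of the strategy automaton can be sketched in Python as follows. This is an illustrative reading of the definitions of $M$, $m_0$, $\delta$ and $\nu$ given in this proof, not code from the paper: the function name, the dictionary-based encoding of positional strategies, and the sample ownership of vertices and punishing strategy used in the usage lines are all assumptions.

```python
def build_automaton(path, k, owner, player, sigma, punish):
    """Return (M, m0, delta, nu) for `player`, following the construction above.

    path   : the distinct vertices v_0 ... v_n of the outcome v_0..v_{k-1}(v_k..v_n)^omega
    owner  : dict vertex -> player controlling it
    sigma  : positional strategy of `player` along the outcome (dict vertex -> vertex)
    punish : dict j -> positional strategy used by `player` to punish player j
    """
    n = len(path) - 1
    index = {v: l for l, v in enumerate(path)}
    others = set(owner.values()) - {player}
    # Memory: the expected edges v0v0, v0v1, ..., v_{n-1}v_n, v_nv_k, plus players to punish.
    edges = [(path[0], path[0])] + list(zip(path, path[1:])) + [(path[n], path[k])]
    memory = set(edges) | others
    m0 = (path[0], path[0])

    def deviator(m, v):
        """Player to punish after reading v in memory state m, or None if nobody deviated."""
        if m in others:
            return m
        u1, u2 = m
        # A vertex different from the expected u2 means the owner of u1 deviated.
        return owner[u1] if v != u2 else None

    def delta(m, v):
        j = deviator(m, v)
        if j is not None:
            return j
        l = index[v]
        return (path[l], path[l + 1]) if l < n else (path[n], path[k])

    def nu(m, v):
        j = deviator(m, v)
        return sigma[v] if j is None else punish[j][v]

    return memory, m0, delta, nu

# Usage with the outcome A(BC)^omega; ownership and strategies are hypothetical.
owner = {"A": 1, "B": 2, "C": 1}
memory, m0, delta, nu = build_automaton(
    ["A", "B", "C"], 1, owner, 1,
    sigma={"A": "B", "C": "B"},
    punish={2: {"A": "B", "C": "A"}},
)
print(sorted(memory, key=str), len(memory) <= len(owner) + 2)
```

The sketch assumes, as in the proof, that the player running the automaton does not deviate himself, so the deviator is always another player.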

D Example 3 continued

Example 21. Thanks to the proof of Theorem 13, we can construct a finite strategy
automaton $\mathcal{A}_1$ that computes the strategy $\tau_1$ of player 1 given in Example 12. The set
$M$ of memory states is $M = \{AA, AB, BC, CB\} \cup \{2\}$ since $\rho = A(BC)^\omega$, and
the initial state is $m_0 = AA$. The memory update function $\delta : M \times V \to M$ and the
transition choice function $\nu : M \times V_1 \to V$ are depicted in Figure 6: a label $v/v'$ on an
edge $(m_1, m_2)$ means that $\delta(m_1, v) = m_2$, and $\nu(m_1, v) = v'$ if $v \in V_1$. If $v \notin V_1$, we
indicate that $\nu$ does not return any advice by a '-', and label the edge with $v/-$.




Fig. 6. The finite strategy automaton $\mathcal{A}_1$.



E Sketch of proof of Theorem 14

Theorem 14 states that in every initialised multiplayer cost game that is cost-prefix-linear
and finite-memory coalition-determined, there exists a Nash equilibrium with finite
memory.

Proof (Sketch). The proof follows the same philosophy as the proof of Theorem 13
and keeps the same notations. Again we consider the Nash equilibrium $(\tau_i)_{i \in \Pi}$ defined
in the proof of Theorem 10, whose outcome is $\rho := \langle (\sigma_i^*)_{i \in \Pi} \rangle_{v_0}$. We recall that for
all $i \in \Pi$, the strategy $\tau_i$ depends on the strategies $\sigma_i^*$ and $\sigma_{i,j}^*$ for $j \in \Pi \setminus \{i\}$. As
the game $\mathcal{G}$ is finite-memory coalition-determined by hypothesis, these strategies are
assumed to be finite-memory. Given $i \in \Pi$ and $j \in \Pi \setminus \{i\}$, we denote by $\mathcal{A}^{\sigma_i^*}$ (resp.
$\mathcal{A}^{\sigma_{i,j}^*}$) a finite strategy automaton for the strategy $\sigma_i^*$ (resp. $\sigma_{i,j}^*$).

As in the proof of Theorem 13, each player needs to remember both the play $\rho$ and
who has to be punished. But here the play $\rho$ is no longer the outcome of a positional
strategy profile: each $\sigma_i^*$ is a finite-memory strategy. Nevertheless, in some sense, we
can see the $\sigma_i^*$'s as positional strategies played on the product graph
$G \times \mathcal{A}^{\sigma_1^*} \times \cdots \times \mathcal{A}^{\sigma_{|\Pi|}^*}$.
This allows us to write $\rho := v_0 \ldots v_{k-1}(v_k \ldots v_n)^\omega$ where
$0 \le k \le n \le |V| \cdot \prod_{j \in \Pi} |\mathcal{A}^{\sigma_j^*}|$ and $v_l \in V$ for all $0 \le l \le n$.
Like in the proof of Theorem 13, we can now define, for any $i \in \Pi$, $\mathcal{A}^{\tau_i}$, a finite strategy
automaton for $\tau_i$. In order to build $\mathcal{A}^{\tau_i}$ explicitly, we need to take into account, on one hand,
the path $\rho$, and on the other hand, the memory of the punishing strategies $\sigma_{i,j}^*$. This makes it
possible to bound the size of $\mathcal{A}^{\tau_i}$ by

$$|V| \cdot \prod_{j \in \Pi} |\mathcal{A}^{\sigma_j^*}| + \sum_{j \in \Pi \setminus \{i\}} |\mathcal{A}^{\sigma_{i,j}^*}|.$$ □

F Remark on the particular Min-Max cost games of Definition 15

Remark 22. Note that reachability-price and discounted-price games are zero-sum
games, whereas the two other ones are not. For example, let us consider the
average-price game $\mathcal{G}$ depicted in Figure 7. The vertices of this game are $A$ and $B$, and the
number 0 or 1 associated to an edge corresponds to the price of this edge ($\pi(A, B) =
\pi(B, B) = 1$ and the price of the other edges is zero).








Fig. 7. Average-price game $\mathcal{G}$.

Let $\rho$ be the play $A B A B^2 A^2 B^4 A^4 \ldots B^{2^n} A^{2^n} \ldots$, where $A^i$ means the
concatenation of $i$ copies of $A$. Then the sequence of prices appearing along $\rho$ is
$1\, 0\, 1^2\, 0^2\, 1^4\, 0^4 \ldots 1^{2^n} 0^{2^n} \ldots$,
and so we get: $\mathrm{AP}_{\mathrm{Min}}(\rho) = \frac{2}{3}$ and $\mathrm{AP}_{\mathrm{Max}}(\rho) = \frac{1}{2}$. As these costs are not equal, the
average-price game $\mathcal{G}$ depicted in Figure 7 is not a zero-sum game. Since an average-price
game is a special case of price-per-reward-average game, we can conclude that
these two kinds of games are non zero-sum games.
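The two limits in Remark 22 can be checked numerically. The Python sketch below (an illustration with hypothetical names, not part of the paper) records the running average of the price sequence $1\,0\,1^2\,0^2\,1^4\,0^4 \ldots$ at the end of each block. The running average rises inside a block of 1s and falls inside a block of 0s, so these block ends are exactly its local maxima and minima.

```python
from fractions import Fraction

def block_end_averages(blocks):
    """Running averages at the end of each block of 1s (peaks) and of 0s (troughs)."""
    ones = 0      # number of edges of price 1 seen so far
    length = 0    # number of edges seen so far
    peaks, troughs = [], []
    for n in range(blocks):
        size = 2 ** n
        ones += size                 # block of `size` edges of price 1
        length += size
        peaks.append(Fraction(ones, length))
        length += size               # block of `size` edges of price 0
        troughs.append(Fraction(ones, length))
    return peaks, troughs

peaks, troughs = block_end_averages(21)
print(float(peaks[-1]), float(troughs[-1]))
```

The peaks decrease towards 2/3, which gives the lim sup, while every trough equals 1/2, which gives the lim inf.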

G Part of the proof of Theorem 17

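Proposition 23. Let $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, E, (\mathrm{Cost}_i)_{i \in \Pi})$ be a multiplayer cost game
where the cost function $\mathrm{Cost}_i$ belongs to $\{\mathrm{RP}_{\mathrm{Min}}, \mathrm{DP}_{\mathrm{Min}}, \mathrm{AP}_{\mathrm{Min}}, \mathrm{PRAvg}_{\mathrm{Min}}\}$ for each
$i \in \Pi$. Then the game $\mathcal{G}$ is cost-prefix-linear and positionally coalition-determined.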

Proof. Let $\mathcal{G}$ be a multiplayer cost game where each cost function is $\mathrm{RP}_{\mathrm{Min}}$, $\mathrm{DP}_{\mathrm{Min}}$,
$\mathrm{AP}_{\mathrm{Min}}$ or $\mathrm{PRAvg}_{\mathrm{Min}}$. Let us first prove that the game $\mathcal{G}$ is cost-prefix-linear. Given $j \in
\Pi$, $v \in V$ and $hv \in \mathrm{Hist}$, we consider the four possible cases for $\mathrm{Cost}_j$. Let $\pi : E \to \mathbb{R}$
be a price function and $\vartheta : E \to \mathbb{R}$ be a diverging reward function. For the sake of
simplicity, we write $hv := h_0 \ldots h_k$ with $k \in \mathbb{N}$, $h_k = v$ and $h_l \in V$ for $l = 0, \ldots, k$.

Footnote: $|\mathcal{A}|$ denotes the number of states of the automaton $\mathcal{A}$.

Footnote: Let us recall that a Min-Max cost game is zero-sum if and only if $\mathrm{Cost}_{\mathrm{Min}} = \mathrm{Gain}_{\mathrm{Max}}$.



Moreover, to avoid heavy notation, we do not explicitly show the dependency between
$\mathrm{Goal}$ and $j$ in the first case or between $\lambda$ and $j$ in the second case.



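(i) Case $\mathrm{Cost}_j = \mathrm{RP}_{\mathrm{Min}}$ for a given goal set $\mathrm{Goal} \subseteq V$:

Let us distinguish two situations. If there exists $l \in \{0, \ldots, k\}$ such that $h_l \in \mathrm{Goal}$,
then we set $a := \sum_{i=1}^{n} \pi(h_{i-1}, h_i) \in \mathbb{R}$ and $b := 0 \in \mathbb{R}^+$, where $n$ is the least
index such that $h_n \in \mathrm{Goal}$. Let $\rho$ be a play with $\mathrm{First}(\rho) = v$, then it implies
that $\mathrm{RP}_{\mathrm{Min}}(h\rho) = \sum_{i=1}^{n} \pi(h_{i-1}, h_i) = a + b \cdot \mathrm{RP}_{\mathrm{Min}}(\rho)$ (with the convention that
$0 \cdot +\infty = 0$).

If there does not exist $l \in \{0, \ldots, k\}$ such that $h_l \in \mathrm{Goal}$, then we set $a :=
\sum_{i=1}^{k} \pi(h_{i-1}, h_i) \in \mathbb{R}$ and $b := 1 \in \mathbb{R}^+$. Let $\rho = \rho_0\rho_1\ldots$ be a play such that
$\mathrm{First}(\rho) = v$. If $\mathrm{RP}_{\mathrm{Min}}(\rho)$ is infinite, then $\mathrm{RP}_{\mathrm{Min}}(h\rho) = +\infty = a + b \cdot \mathrm{RP}_{\mathrm{Min}}(\rho)$.
Otherwise, if $n$ is the least index in $\mathbb{N}$ such that $\rho_n \in \mathrm{Goal}$, then we have that:

$$\mathrm{RP}_{\mathrm{Min}}(h\rho) = \sum_{i=1}^{k} \pi(h_{i-1}, h_i) + \sum_{i=1}^{n} \pi(\rho_{i-1}, \rho_i) = a + b \cdot \mathrm{RP}_{\mathrm{Min}}(\rho).$$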

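(ii) Case $\mathrm{Cost}_j = \mathrm{DP}_{\mathrm{Min}}(\lambda)$ for a given discount factor $\lambda \in\, ]0, 1[$:

We set $a := (1 - \lambda) \sum_{i=1}^{k} \lambda^{i-1} \pi(h_{i-1}, h_i) \in \mathbb{R}$ and $b := \lambda^k \in \mathbb{R}^+$. Given a play
$\rho = \rho_0\rho_1\ldots$ such that $\mathrm{First}(\rho) = v$ and $\eta := h\rho \in \mathrm{Plays}$ (with $\eta = \eta_0\eta_1\ldots$), we
have that:

$$\begin{aligned}
\mathrm{DP}_{\mathrm{Min}}(\lambda)(h\rho) &= \mathrm{DP}_{\mathrm{Min}}(\lambda)(\eta) \\
&= (1 - \lambda) \sum_{i=1}^{+\infty} \lambda^{i-1} \pi(\eta_{i-1}, \eta_i) \\
&= (1 - \lambda) \sum_{i=1}^{k} \lambda^{i-1} \pi(\eta_{i-1}, \eta_i) + (1 - \lambda) \sum_{i=k+1}^{+\infty} \lambda^{i-1} \pi(\eta_{i-1}, \eta_i) \\
&= (1 - \lambda) \sum_{i=1}^{k} \lambda^{i-1} \pi(h_{i-1}, h_i) + \lambda^{k} (1 - \lambda) \sum_{i=1}^{+\infty} \lambda^{i-1} \pi(\rho_{i-1}, \rho_i) \\
&= a + b \cdot \mathrm{DP}_{\mathrm{Min}}(\lambda)(\rho).
\end{aligned}$$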

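(iii) Case $\mathrm{Cost}_j = \mathrm{AP}_{\mathrm{Min}}$:

We set $a := 0 \in \mathbb{R}$ and $b := 1 \in \mathbb{R}^+$. Given $\rho \in \mathrm{Plays}$ such that $\mathrm{First}(\rho) = v$ and
$\eta := h\rho \in \mathrm{Plays}$ (with $\eta = \eta_0\eta_1\ldots$), we show that:

$$\mathrm{AP}_{\mathrm{Min}}(h\rho) = \mathrm{AP}_{\mathrm{Min}}(\eta) = \mathrm{AP}_{\mathrm{Min}}(\rho).$$

If $\mathrm{AP}_{\mathrm{Min}}(\eta) = \mathrm{AP}_{\mathrm{Min}}(\rho) = +\infty$ or $-\infty$, the desired result obviously holds.
Otherwise, let us set $x_n := \frac{1}{n} \sum_{i=1}^{n} \pi(\eta_{i-1}, \eta_i)$ and $y_n := \frac{1}{n} \sum_{i=1}^{n} \pi(\rho_{i-1}, \rho_i)$, for all
$n \in \mathbb{N}_0$. By properties of the limit superior and definition of the $\mathrm{AP}_{\mathrm{Min}}$ function, it
holds that:

$$\limsup_{n \to +\infty} (x_n - y_n) \ge \mathrm{AP}_{\mathrm{Min}}(\eta) - \mathrm{AP}_{\mathrm{Min}}(\rho) \ge \liminf_{n \to +\infty} (x_n - y_n).$$

It remains to prove that the sequence $(x_n - y_n)_{n \in \mathbb{N}}$ converges to 0. For all $n > k$,
we have that:

$$|x_n - y_n| = \frac{1}{n} \left| \sum_{i=1}^{k} \pi(h_{i-1}, h_i) - \sum_{i=n-k+1}^{n} \pi(\rho_{i-1}, \rho_i) \right|.$$

As the absolute value is bounded independently of $n$ (let us recall that $E$ is finite),
we can conclude that $(x_n - y_n)_{n \in \mathbb{N}}$ converges to 0, and so $\mathrm{AP}_{\mathrm{Min}}(\eta) = \mathrm{AP}_{\mathrm{Min}}(\rho)$.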
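(iv) Case $\mathrm{Cost}_j = \mathrm{PRAvg}_{\mathrm{Min}}$:

We set $a := 0 \in \mathbb{R}$ and $b := 1 \in \mathbb{R}^+$. Given $\rho \in \mathrm{Plays}$ such that $\mathrm{First}(\rho) = v$ and
$\eta := h\rho \in \mathrm{Plays}$ (with $\eta = \eta_0\eta_1\ldots$), we show that:

$$\mathrm{PRAvg}_{\mathrm{Min}}(h\rho) = \mathrm{PRAvg}_{\mathrm{Min}}(\eta) = \mathrm{PRAvg}_{\mathrm{Min}}(\rho).$$

Thanks to several properties of lim sup, we have that:

$$\begin{aligned}
\mathrm{PRAvg}_{\mathrm{Min}}(\rho) &= \limsup_{n \to +\infty} \frac{\sum_{i=1}^{n} \pi(\rho_{i-1}, \rho_i)}{\sum_{i=1}^{n} \vartheta(\rho_{i-1}, \rho_i)} \\
&= \limsup_{n \to +\infty} \frac{\sum_{i=1}^{n} \pi(\eta_{k+i-1}, \eta_{k+i})}{\sum_{i=1}^{n} \vartheta(\eta_{k+i-1}, \eta_{k+i})} \\
&= \limsup_{n \to +\infty} \frac{\sum_{i=1}^{n+k} \pi(\eta_{i-1}, \eta_i) - \sum_{i=1}^{k} \pi(\eta_{i-1}, \eta_i)}{\sum_{i=1}^{n+k} \vartheta(\eta_{i-1}, \eta_i) - \sum_{i=1}^{k} \vartheta(\eta_{i-1}, \eta_i)} \\
&= \limsup_{n \to +\infty} \frac{\sum_{i=1}^{n+k} \pi(\eta_{i-1}, \eta_i)}{\sum_{i=1}^{n+k} \vartheta(\eta_{i-1}, \eta_i) - \sum_{i=1}^{k} \vartheta(\eta_{i-1}, \eta_i)} && (8) \\
&= \limsup_{n \to +\infty} \frac{\sum_{i=1}^{n+k} \pi(\eta_{i-1}, \eta_i)}{\sum_{i=1}^{n+k} \vartheta(\eta_{i-1}, \eta_i)} \cdot \frac{\sum_{i=1}^{n+k} \vartheta(\eta_{i-1}, \eta_i)}{\sum_{i=1}^{n+k} \vartheta(\eta_{i-1}, \eta_i) - \sum_{i=1}^{k} \vartheta(\eta_{i-1}, \eta_i)} \\
&= \limsup_{n \to +\infty} \frac{\sum_{i=1}^{n+k} \pi(\eta_{i-1}, \eta_i)}{\sum_{i=1}^{n+k} \vartheta(\eta_{i-1}, \eta_i)} && (9) \\
&= \limsup_{n \to +\infty} \frac{\sum_{i=1}^{n} \pi(\eta_{i-1}, \eta_i)}{\sum_{i=1}^{n} \vartheta(\eta_{i-1}, \eta_i)} \\
&= \mathrm{PRAvg}_{\mathrm{Min}}(\eta) = \mathrm{PRAvg}_{\mathrm{Min}}(h\rho).
\end{aligned}$$

Line (8) comes from the fact that the reward function $\vartheta$ is diverging, and from the
following property: if $\lim_{n \to +\infty} b_n = b \in \mathbb{R}$, then $\limsup_{n \to +\infty} (a_n + b_n) =
(\limsup_{n \to +\infty} a_n) + b$. Line (9) is implied by this property: if $\lim_{n \to +\infty} b_n =
b > 0$, then $\limsup_{n \to +\infty} (a_n \cdot b_n) = (\limsup_{n \to +\infty} a_n) \cdot b$.

Note that, if the history $h$ is empty, then $k = 0$ and, in all cases, $a$ is equal to 0 and $b$ to
1. This actually implies that $\mathrm{Cost}_i(h\rho) = \mathrm{Cost}_i(\rho)$ holds.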
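The identities of cases (ii) and (iii) can be sanity-checked on ultimately periodic plays with exact rational arithmetic. The Python sketch below is only an illustration under assumptions: plays and histories are represented by their lists of edge prices, and all names and sample values are hypothetical.

```python
from fractions import Fraction
from itertools import chain, cycle, islice

def discounted_price(prefix, loop, lam):
    """DP_Min(lam) of a play whose successive edge prices are prefix . loop^omega."""
    head = sum(lam ** i * p for i, p in enumerate(prefix))
    turn = sum(lam ** j * p for j, p in enumerate(loop))
    # Geometric series: the loop is replayed every len(loop) steps.
    return (1 - lam) * (head + lam ** len(prefix) * turn / (1 - lam ** len(loop)))

def discounted_coefficients(history, lam):
    """Pair (a, b) of case (ii) for a history whose edge prices are `history`."""
    a = (1 - lam) * sum(lam ** i * p for i, p in enumerate(history))
    return a, lam ** len(history)

def first_prices(prefix, loop, n):
    """First n edge prices of the play prefix . loop^omega."""
    return list(islice(chain(prefix, cycle(loop)), n))

def average_gap(history, prefix, loop, n):
    """x_n - y_n of case (iii): difference of the n-step averages of h.rho and rho."""
    x_n = Fraction(sum(first_prices(history + prefix, loop, n)), n)
    y_n = Fraction(sum(first_prices(prefix, loop, n)), n)
    return x_n - y_n

lam = Fraction(1, 2)
history, prefix, loop = [3, 1], [2], [1, 0]     # made-up prices; k = 2
a, b = discounted_coefficients(history, lam)
lhs = discounted_price(history + prefix, loop, lam)
rhs = a + b * discounted_price(prefix, loop, lam)
print(lhs, rhs)                                  # both sides of case (ii)
print(average_gap([5, 5], [], [1, 0], 1000))     # gap of case (iii), shrinking like 1/n
```

In the average-price check, the gap is at most $2kW/n$, where $k$ is the length of the history and $W$ bounds the absolute value of the prices, which is the reason it vanishes in the limit.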

Let us now prove that the game $\mathcal{G}$ is positionally coalition-determined. Given a
player $i \in \Pi$, if $\mathrm{Cost}_i = \mathrm{RP}_{\mathrm{Min}}$, then we take $\mathrm{Gain}^i_{\mathrm{Max}} = \mathrm{RP}_{\mathrm{Max}}$. We do the same for
the other cases by defining the gain function $\mathrm{Gain}^i_{\mathrm{Max}}$ for the coalition as the counterpart
of $\mathrm{Cost}_i$ in Definition 15. Clearly, it holds that $\mathrm{Cost}_i \ge \mathrm{Gain}^i_{\mathrm{Max}}$. Moreover, the
Min-Max cost game $\mathcal{G}^i = (V, V_i, V \setminus V_i, E, \mathrm{Cost}_i, \mathrm{Gain}^i_{\mathrm{Max}})$ is determined and has positional
optimal strategies by Theorem 16. □



